Indexing values of time sequences

Ling Lin, Tore Risch, Martin Skold, Dushan Badal 

November 1996 Proceedings of the fifth international conference on Information and 
knowledge management CIKM '96 

Publisher: ACM Press 

Full text available: pdf (802.28 KB) Additional Information: full citation, references, citings, index terms



Integrating and customizing heterogeneous e-commerce applications
Anat Eyal, Tova Milo 

August 2001 The VLDB Journal — The International Journal on Very Large Data 

Bases, Volume 10 Issue 1 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf (286.63 KB) Additional Information: full citation, abstract, citings, index terms

A broad spectrum of electronic commerce applications is currently available on the Web, 
providing services in almost any area one can think of. As the number and variety of such
applications grow, more business opportunities emerge for providing new services based 
on the integration and customization of existing applications. (Web shopping malls and 
support for comparative shopping are just a couple of examples.) Unfortunately, the 
diversity of applications in each specific domain and the dispar ... 

Keywords: Application Integration, Data integration, Electronic commerce 



Securing Name Servers on UNIX
Nalneesh Gaur 

December 1999 Linux Journal 

Publisher: Specialized Systems Consultants, Inc. 

Full text available: html (14.66 KB) Additional Information: full citation, abstract, references, index terms

Because the DNS plays such a vital role in the Internet, it is important that this service be
protected and secured.

Dynamic expression trees and their applications
Robert F. Cohen, Roberto Tamassia 

March 1991 Proceedings of the second annual ACM-SIAM symposium on Discrete
algorithms SODA '91 

Publisher: Society for Industrial and Applied Mathematics 

Full text available: pdf (934.44 KB) Additional Information: full citation, references, citings, index terms
Rank/select operations on large alphabets: a tool for text indexing
Alexander Golynski, J. Ian Munro, S. Srinivasa Rao 

January 2006 Proceedings of the seventeenth annual ACM-SIAM symposium on 
Discrete algorithm SODA '06 

Publisher: ACM Press 



Full text available: pdf (161.02 KB) Additional Information: full citation, abstract, references, cited by, index terms



We consider a generalization of the problem of supporting rank and select queries on
binary strings. Given a string of length n from an alphabet of size σ, we give the first
representation that supports rank and access operations in O(lg lg σ) time, and select in
O(1) time while using the optimal n lg σ + o(n lg σ) bits. The best known previous structure
for this problem required O(lg σ) time, for ...

Scaling up the semantic web: On labeling schemes for the semantic web

Vassilis Christophides, Dimitris Plexousakis, Michel Scholl, Sotirios Tourtounis

May 2003 Proceedings of the 12th international conference on World Wide Web 

WWW '03 
Publisher: ACM Press 

Full text available: pdf (294.32 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper focuses on the optimization of the navigation through voluminous subsumption
hierarchies of topics employed by Portal Catalogs like Netscape Open Directory (ODP). We 
advocate for the use of labeling schemes for modeling these hierarchies in order to 
efficiently answer queries such as subsumption check, descendants, ancestors or nearest 
common ancestor, which usually require costly transitive closure computations. We first 
give a qualitative comparison of three main families of schemes ... 

Searching in metric spaces

Edgar Chavez, Gonzalo Navarro, Ricardo Baeza-Yates, Jose Luis Marroquin
September 2001 ACM Computing Surveys (CSUR), volume 33 issue 3 
Publisher: ACM Press 

Full text available: pdf (916.04 KB) Additional Information: full citation, abstract, references, citings, index terms

The problem of searching the elements of a set that are close to a given query element 
under some similarity criterion has a vast number of applications in many branches of 
computer science, from pattern recognition to textual and multimedia information 
retrieval. We are interested in the rather general case where the similarity criterion 
defines a metric space, instead of the more restricted case of a vector space. Many 
solutions have been proposed in different areas, in many cases without cros ... 

Keywords: Curse of dimensionality, nearest neighbors, similarity searching, vector 
spaces 



A methodology for creating user views in database design
Veda C. Storey, Robert C. Goldstein

September 1988 ACM Transactions on Database Systems (TODS), volume 13 issue 3
Publisher: ACM Press 

Full text available: pdf (2.41 MB) Additional Information: full citation, abstract, references, citings, index terms, review

The View Creation System (VCS) is an expert system that engages a user in a dialogue 
about the information requirements for some application, develops an Entity-Relationship 
model for the user's database view, and then converts the E-R model to a set of Fourth 
Normal Form relations. This paper describes the knowledge base of VCS. That is, it 
presents a formal methodology, capable of mechanization as a computer program, for 
Accurate and efficient predicate analysis with binary decision diagrams
John W. Sias, Wen-Mei W. Hwu, David I. August

December 2000 Proceedings of the 33rd annual ACM/IEEE international symposium
on Microarchitecture MICRO 33

Publisher: ACM Press

Full text available: pdf (336.92 KB) Additional Information: full citation, references, citings, index terms
ps (6.25 MB)
Publisher Site



10 Time series similarity measures (tutorial PM-2) 
Dimitrios Gunopulos, Gautam Das 

August 2000 Tutorial notes of the sixth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '00 

Publisher: ACM Press 

Full text available: pdf (1.42 MB) Additional Information: full citation, references, citings, index terms




WebView materialization 
Alexandros Labrinidis, Nick Roussopoulos

May 2000 ACM SIGMOD Record , Proceedings of the 2000 ACM SIGMOD international 

conference on Management of data SIGMOD '00, volume 29 issue 2 
Publisher: ACM Press 

Full text available: pdf (195.16 KB) Additional Information: full citation, abstract, references, citings, index terms

A WebView is a web page automatically created from base data typically stored in a 
DBMS. Given the multi-tiered architecture behind database-backed web servers, we have 
the option of materializing a WebView inside the DBMS, at the web server, or not at all, 
always computing it on the fly (virtual). Since WebViews must be up to date, materialized 
WebViews are immediately refreshed with every update on the base data. In this paper 
we compare the three materialization policies (materializ ... 

System Administration
Marjorie Richardson 
December 1999 Linux Journal 

Publisher: Specialized Systems Consultants, Inc. 

Full text available: html (4.25 KB) Additional Information: full citation, index terms



13 Nearest neighbor queries in metric spaces

Kenneth L. Clarkson
May 1997 Proceedings of the twenty-ninth annual ACM symposium on Theory of
computing STOC '97 

Publisher: ACM Press 

Full text available: pdf (1.31 MB) Additional Information: full citation, references, citings, index terms



14 Security issues with TCP/IP 
Renqi Li, E. A. Unger 

June 1995 ACM SIGAPP Applied Computing Review, volume 3 issue 1
Publisher: ACM Press 

Full text available: pdf (801.12 KB) Additional Information: full citation, abstract, index terms
An introduction to network security, basic definitions, and a brief discussion of the
architecture of TCP/IP as well as the Open System Interconnection (OSI) Reference Model
open the paper. The relationship between TCP/IP and some of the OSI layers is described. An
in-depth look is provided at the major protocols in the TCP/IP suite and the security features
and problems in this suite of protocols. The security problems are discussed in the context
of the protocol services.

Keywords: TCP/IP, Unix, network security, security 



15 An experimental active memory based I/O subsystem



We describe an I/O subsystem based on an active memory called SWIM, designed for 
efficient storage and manipulation of data structures. The key architectural idea in SWIM 
is to put some processing logic inside each memory chip that allows it to perform data 
manipulation operations locally and to communicate with a disk or a communication line 
through a backend port. The processing logic is specially designed to perform operations 
such as pointer dereferencing, memory indirection, searching and b ... 

16 Lineage retrieval for scientific data processing: a survey
Rajendra Bose, James Frew

March 2005 ACM Computing Surveys (CSUR), volume 37 issue 1

Publisher: ACM Press 



Scientific research relies as much on the dissemination and exchange of data sets as on 
the publication of conclusions. Accurately tracking the lineage (origin and subsequent 
processing history) of scientific data sets is thus imperative for the complete 
documentation of scientific work. Researchers are effectively prevented from determining, 
preserving, or providing the lineage of the computational data products they use and 
create, however, because of the lack of a definitive model for lineage ...

Keywords: Data lineage, audit, data provenance, scientific data, scientific workflow 

17 Object-based and image-based object representations
Hanan Samet

June 2004 ACM Computing Surveys (CSUR), volume 36 issue 2
Publisher: ACM Press 

Full text available: pdf (1.05 MB) Additional Information: full citation, abstract, references, index terms

An overview is presented of object-based and image-based representations of objects by 
their interiors. The representations are distinguished by the manner in which they can be 
used to answer two fundamental queries in database applications: (1) Feature query: 
given an object, determine its constituent cells (i.e., their locations in space). (2) Location
query: given a cell (i.e., a location in space), determine the identity of the object (or
objects) of which it is a member as well as the re ... 

Keywords: Access methods, R-trees, feature query, geographic information systems 
(GIS), image space, location query, object space, octrees, pyramids, quadtrees, space- 
filling curves, spatial databases 
Venugopalan Ramasubramanian, Emin Gun Sirer




September 1994 ACM SIGARCH Computer Architecture News, volume 22 issue 4 
Publisher: ACM Press 

Full text available: pdf (577.76 KB) Additional Information: full citation, abstract, index terms



Full text available: pdf (728.75 KB)
August 2004 ACM SIGCOMM Computer Communication Review, Proceedings of the
2004 conference on Applications, technologies, architectures, and 
protocols for computer communications SIGCOMM '04, volume 34 issue 4 

Publisher: ACM Press 

Full text available: pdf (472.93 KB) Additional Information: full citation, abstract, references, citings, index terms



Name services are critical for mapping logical resource names to physical resources in 
large-scale, distributed systems. The Domain Name System (DNS) used on the Internet,
however, is slow, vulnerable to denial of service attacks, and does not support fast
updates. These problems stem fundamentally from the structure of the legacy DNS. This
paper describes the design and implementation of the Cooperative Domain Name System 
(CoDoNS), a novel name service, which provides high lookup performance thro ... 

Keywords: DNS, peer to peer, proactive caching 
1 Integrating and customizing heterogeneous e-commerce applications
Anat Eyal, Tova Milo 

August 2001 The VLDB Journal — The International Journal on Very Large Data 

Bases, Volume 10 Issue 1 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf (286.63 KB) Additional Information: full citation, abstract, citings, index terms

A broad spectrum of electronic commerce applications is currently available on the Web, 
providing services in almost any area one can think of. As the number and variety of such 
applications grow, more business opportunities emerge for providing new services based 
on the integration and customization of existing applications. (Web shopping malls and 
support for comparative shopping are just a couple of examples.) Unfortunately, the 
diversity of applications in each specific domain and the dispar ... 



Keywords: Application integration, Data Integration, Electronic commerce 



Scaling up the semantic web: On labeling schemes for the semantic web
Vassilis Christophides, Dimitris Plexousakis, Michel Scholl, Sotirios Tourtounis 
May 2003 Proceedings of the 12th international conference on World Wide Web 

WWW '03 
Publisher: ACM Press

Full text available: pdf (294.32 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper focuses on the optimization of the navigation through voluminous subsumption 
hierarchies of topics employed by Portal Catalogs like Netscape Open Directory (ODP). We 
advocate for the use of labeling schemes for modeling these hierarchies in order to 
efficiently answer queries such as subsumption check, descendants, ancestors or nearest 
common ancestor, which usually require costly transitive closure computations. We first 
give a qualitative comparison of three main families of schemes ... 
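
As a rough illustration of how a labeling scheme can answer a subsumption check without a transitive closure, the sketch below uses interval labels on a tree-shaped hierarchy. Interval labeling is chosen here only as a familiar example; the truncated abstract does not name the schemes it compares, and the function names and the sample hierarchy are hypothetical.

```python
def label_tree(children, root):
    """Assign each node a (start, end) interval by depth-first numbering.

    `children` maps a node to the list of its direct subtopics.
    Assumes the hierarchy is a tree (each topic has a single parent).
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        # `end` is the largest start number handed out inside this subtree.
        labels[node] = (start, counter - 1)

    visit(root)
    return labels


def subsumes(labels, ancestor, node):
    """True if `node` lies in the subtree of `ancestor` (reflexive)."""
    a_start, a_end = labels[ancestor]
    n_start, _ = labels[node]
    return a_start <= n_start <= a_end


def descendants(labels, ancestor):
    """All strict descendants of `ancestor`, found by comparing labels only."""
    a_start, a_end = labels[ancestor]
    return sorted(n for n, (s, _) in labels.items() if a_start < s <= a_end)


# Hypothetical topic hierarchy, for illustration only.
hierarchy = {
    "Top": ["Arts", "Science"],
    "Arts": ["Music"],
    "Science": ["Physics", "Math"],
}
topic_labels = label_tree(hierarchy, "Top")
```

With labels in place, each subsumption check is two integer comparisons, which is the kind of saving over repeated graph traversal that the abstract alludes to.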

3 BPF+: exploiting global data-flow optimization in a generalized packet filter
architecture

Andrew Begel, Steven McCanne, Susan L. Graham

August 1999 ACM SIGCOMM Computer Communication Review , Proceedings of the 
conference on Applications, technologies, architectures, and protocols 
for computer communication SIGCOMM '99, volume 29 issue 4
Publisher: ACM Press 

Full text available: pdf (1.55 MB) Additional Information: full citation, abstract, references, citings, index terms

A packet filter is a programmable selection criterion for classifying or selecting packets 
from a packet stream in a generic, reusable fashion. Previous work on packet filters falls
roughly into two categories, namely those efforts that investigate flexible and extensible 
filter abstractions but sacrifice performance, and those that focus on low-level, optimized 
filtering representations but sacrifice flexibility. Applications like network monitoring and 
intrusion detection, however, requ ... 

4 Lineage retrieval for scientific data processing: a survey
Rajendra Bose, James Frew

March 2005 ACM Computing Surveys (CSUR), volume 37 issue 1
Publisher: ACM Press 

Full text available: pdf (728.75 KB) Additional Information: full citation, abstract, references, citings, index terms

Scientific research relies as much on the dissemination and exchange of data sets as on 
the publication of conclusions. Accurately tracking the lineage (origin and subsequent 
processing history) of scientific data sets is thus imperative for the complete 
documentation of scientific work. Researchers are effectively prevented from determining, 
preserving, or providing the lineage of the computational data products they use and 
create, however, because of the lack of a definitive model for lineage ... 

Keywords: Data lineage, audit, data provenance, scientific data, scientific workflow 

Literature-based discovery on the World Wide Web
Michael Gordon, Robert K. Lindsay, Weiguo Fan 

November 2002 ACM Transactions on Internet Technology (TOIT), volume 2 issue 4 
Publisher: ACM Press 

Full text available: pdf (119.62 KB) Additional Information: full citation, abstract, references, citings, index terms

Previous research has shown that researchers can generate medical hypotheses by using
computers to analyze several, seemingly unrelated, medical literatures. In this work we
suggest broader application for the ideas of literature-based discovery. Specifically, we 
suggest that literature-based discovery can be fruitful in areas other than medicine; that 
in addition to finding "cures" for "problems," literature-based discovery offers the 
possibility of finding new problems for existing technologie ... 

Keywords: Literature-based discovery 

Accurate and efficient predicate analysis with binary decision diagrams
John W. Sias, Wen-Mei W. Hwu, David I. August

December 2000 Proceedings of the 33rd annual ACM/IEEE international symposium 

on Microarchitecture MICRO 33 
Publisher: ACM Press 
Full text available: pdf (336.92 KB) Additional Information: full citation, references, citings, index terms
ps (6.25 MB)
Publisher Site
Michael Brady

March 1982 ACM Computing Surveys (CSUR), volume 14 issue 1
Publisher: ACM Press 
Alexander Golynski, J. Ian Munro, S. Srinivasa Rao

January 2006 Proceedings of the seventeenth annual ACM-SIAM symposium on 
Discrete algorithm SODA '06 

Publisher: ACM Press 

Full text available: pdf (161.02 KB) Additional Information: full citation, abstract, references, cited by, index terms

We consider a generalization of the problem of supporting rank and select queries on
binary strings. Given a string of length n from an alphabet of size σ, we give the first
representation that supports rank and access operations in O(lg lg σ) time, and select in
O(1) time while using the optimal n lg σ + o(n lg σ) bits. The best known previous structure
for this problem required O(lg σ) time, for ...
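
To make the three operations named in the abstract concrete, here is a deliberately naive sketch of access, rank and select over a string. It stores one sorted position list per symbol, so it uses far more space than the succinct representation the paper describes and does not reproduce its time bounds. The class name and the 1-based convention for select are assumptions made for this illustration.

```python
from bisect import bisect_left
from collections import defaultdict


class NaiveRankSelect:
    """Position lists per symbol; illustrative only, not succinct."""

    def __init__(self, text):
        self.text = text
        self.positions = defaultdict(list)
        for i, symbol in enumerate(text):
            self.positions[symbol].append(i)

    def access(self, i):
        """Symbol stored at index i."""
        return self.text[i]

    def rank(self, symbol, i):
        """Number of occurrences of `symbol` in text[0:i]."""
        return bisect_left(self.positions.get(symbol, []), i)

    def select(self, symbol, k):
        """Index of the k-th occurrence (1-based) of `symbol`, or -1."""
        occurrences = self.positions.get(symbol, [])
        return occurrences[k - 1] if 1 <= k <= len(occurrences) else -1


rs = NaiveRankSelect("abracadabra")
```

Here rank costs a binary search over the occurrences of one symbol and select is a direct list lookup; the paper's contribution is obtaining comparable query behaviour within n lg σ + o(n lg σ) bits.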

Designing and Optimizing a Scalable CORBA Notification Service
Pradeep Gore, Ron Cytron, Douglas Schmidt, Carlos O'Ryan 

August 2001 ACM SIGPLAN Notices , Proceedings of the ACM SIGPLAN workshop on 
Languages, compilers and tools for embedded systems LCTES '01 , 
Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of 
middleware and distributed systems OM '01, volume 36 issue 8 
Publisher: ACM Press 

Full text available: pdf (247.10 KB) Additional Information: full citation, abstract, references, citings, index terms

Many distributed applications require a scalable event-driven communication model that 
decouples suppliers from consumers and simultaneously supports advanced quality of 
service (QoS) properties and event filtering mechanisms. The CORBA Notification Service 
provides a publish/subscribe mechanism that is designed to support scalable event-driven 
communication by routing events efficiently between many suppliers and consumers, 
enforcing various QoS properties (such as reliability, priority, orderi ... 

10 Using annotations to reduce dynamic optimization time
Chandra Krintz, Brad Calder

May 2001 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2001 conference
on Programming language design and implementation PLDI '01, volume 36
issue 5
Publisher: ACM Press 

Full text available: pdf (1.78 MB) Additional Information: full citation, abstract, references, citings, index terms

Dynamic compilation and optimization are widely used in heterogeneous computing
environments, in which an intermediate form of the code is compiled to native code 
during execution. An important trade off exists between the amount of time spent 
dynamically optimizing the program and the running time of the program. The time to 
perform dynamic optimizations can cause significant delays during execution and also 
prohibit performance gains that result from more complex optimization. 
Alexandros Labrinidis, Nick Roussopoulos

May 2000 ACM SIGMOD Record , Proceedings of the 2000 ACM SIGMOD international 

conference on Management of data SIGMOD '00, volume 29 issue 2 
Publisher: ACM Press 
Full text available: pdf (195.16 KB) Additional Information: full citation, abstract, references, citings, index terms

A WebView is a web page automatically created from base data typically stored in a 
DBMS. Given the multi-tiered architecture behind database-backed web servers, we have 
the option of materializing a WebView inside the DBMS, at the web server, or not at all, 
always computing it on the fly (virtual). Since WebViews must be up to date, materialized 
WebViews are immediately refreshed with every update on the base data. In this paper 
we compare the three materialization policies (materializ ...
13 Measurement tools: Introducing scalability in network measurement: toward 10 Gbps
with commodity hardware
Loris Degioanni, Gianluca Varenni

October 2004 Proceedings of the 4th ACM SIGCOMM conference on Internet 

measurement IMC '04 
Publisher: ACM Press 

Full text available: pdf (558.01 KB) Additional Information: full citation, abstract, references, index terms

The capacity of today's network links, along with the heterogeneity of their traffic, is
rapidly growing, more than the workstation's processing power. This makes the task of
measuring traffic more problematic every day, especially when off-the-shelf hardware is
used. A general solution adopted by the computer industry to achieve better performance
is to partition the processing among different computing units, exploiting the implicit or
explicit parallelism available on today's workstations. P ...

Keywords: high performance, scalability, software tools 




Search potpourri: Efficient search engine measurements

Ziv Bar-Yossef, Maxim Gurevich
May 2007 Proceedings of the 16th international conference on World Wide Web 

WWW '07 
Publisher: ACM Press 

Full text available: pdf (271.45 KB) Additional Information: full citation, abstract, references, index terms

We address the problem of measuring global quality metrics of search engines, like
corpus size, index freshness, and density of duplicates in the corpus. The recently
proposed estimators for such metrics [2, 6] suffer from significant bias and/or poor
performance, due to inaccurate approximation of the so-called "document degrees". We
present two new estimators that are able to overcome the bias introduced by approximate
degrees. Our estimators are based on a careful implementation of an approximat ...

Keywords: corpus size estimation, evaluation, search engines 
Venkata Sudhakar Reddy Ch, Banshi D. Chaudhary

December 2006 Proceedings of the 2006 IEEE/WIC/ACM International Conference on
Web Intelligence WI '06 

Publisher: IEEE Computer Society 

Full text available: pdf (197.63 KB) Additional Information: full citation, abstract, index terms

This paper reports a query probing strategy which exploits concept hierarchy of Open 
Directory Project (ODP) to discover knowledge about search engines. In this strategy, 
keywords are selected on the basis of frequency analysis of words appearing in 
descriptions of URLs associated with a concept. The selected keywords, their senses and 
the words adjacent to these keywords are used in construction of query phrases. Each 
search engine is probed with these query phrases and their first page result ... 
16 Research track papers: Mining templates from search result records of search
engines

Hongkun Zhao, Weiyi Meng, Clement Yu

August 2007 Proceedings of the 13th ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '07 

Publisher: ACM Press 

Full text available: pdf (972.30 KB) Additional Information: full citation, abstract, references, index terms

Metasearch engine, Comparison-shopping and Deep Web crawling applications need to 
extract search result records enwrapped in result pages returned from search engines in 
response to user queries. The search result records from a given search engine are 
usually formatted based on a template. Precisely identifying this template can greatly help 
extract and annotate the data units within each record correctly. In this paper, we 
propose a graph model to represent record template and develop a dom ... 

Keywords: Information extraction, search engine, wrapper generation 




17 Web resource crawling and searching: Identifying redundant search engines in a very
large scale metasearch engine context

Ronak Desai, Qi Yang, Zonghuan Wu, Weiyi Meng, Clement Yu

November 2006 Proceedings of the eighth ACM international workshop on Web
information and data management WIDM '06 

Publisher: ACM Press 

Full text available: pdf (299.08 KB) Additional Information: full citation, abstract, references, index terms

For a given set of search engines, a search engine is redundant if its searchable contents 
can be found from other search engines in this set. In this paper, we propose a method to 
identify redundant search engines in a very large-scale metasearch engine context. The
general problem is equivalent to an NP hard problem -- the set-covering problem. Due to 
the large number of search engines that need to be considered and the large sizes of 
these search engines, approximate solutions must be develop ... 

Keywords: redundant search engine identification, set-covering problem 
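
Since the abstract reduces redundancy detection to set covering and calls for approximate solutions, a minimal sketch of the textbook greedy set-cover heuristic may help fix ideas. This is not claimed to be the authors' method; the function name, the engine names and the document identifiers below are all made up for illustration.

```python
def greedy_cover(engines):
    """Pick engines greedily until every document is covered.

    `engines` maps an engine name to the set of documents it can return.
    Ties are broken by engine name so the result is deterministic.
    """
    universe = set().union(*engines.values())
    covered = set()
    chosen = []
    while covered != universe:
        # Take the engine that adds the most not-yet-covered documents.
        best = max(sorted(engines), key=lambda name: len(engines[name] - covered))
        chosen.append(best)
        covered |= engines[best]
    return chosen


# Hypothetical engines and document ids.
sample_engines = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {1, 2},
    "D": {4, 5},
}
kept = greedy_cover(sample_engines)
redundant = sorted(set(sample_engines) - set(kept))
```

Engines left out of the greedy cover are candidates for redundancy in the sense of the abstract, because everything they return is already reachable through the engines that were kept.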
Frank McCown, Michael L. Nelson

June 2007 Proceedings of the 2007 conference on Digital libraries JCDL '07 

Publisher: ACM Press 

Full text available: pdf (308.76 KB) Additional Information: full citation, abstract, references, index terms

Google, Yahoo and MSN all provide both web user interfaces (WUIs) and application 
programming interfaces (APIs) to their collections. Whether building collections of 
resources or studying the search engines themselves, the search engines request that 
researchers use their APIs and not "scrape" the WUIs. However, anecdotal evidence 
suggests the interfaces produce different results. We provide the first in depth 
quantitative analysis of the results produced by the Google, MSN and Yahoo API and ... 

Keywords: API, distance measurement, search engine interfaces, search engine results 
Geographic web search engines allow users to constrain and order search results in an
intuitive manner by focusing a query on a particular geographic region. Geographic search
technology, also called local search, has recently received significant interest from major 
search engine companies. Academic research in this area has focused primarily on 
techniques for extracting geographic knowledge from the web. In this paper, we study the 
problem of efficient query processing in scalable geogr ... 

20 Business-to-business e-commerce track: The impact of search engine optimization
on online advertising market
Bo Xing, Zhangxi Lin

August 2006 Proceedings of the 8th international conference on Electronic 

commerce: The new e-commerce: innovations for conquering current 
barriers, obstacles and limitations to conducting successful business on 
the internet ICEC '06 
Publisher: ACM Press 

Full text available: pdf (612.09 KB) Additional Information: full citation, abstract, references, index terms

The online advertising market is becoming a popular area of academic research. Among other
types of advertising, search engine advertising is leading the growth in terms of revenue.
In general, there are two types of search engine advertising: paid placement and search
engine optimization (SEO). This study aims to analyze the condition under which SEO
exists and, further, its impact on the advertising market. With an analytical model, several
interesting insights are generated. The results of the stud ... 

Keywords: online advertising, paid placement, search engine, search engine marketing,
search engine optimization, sponsored links
Querying and web: Efficient query processing in geographic web search engines
Yen-Yu Chen, Torsten Suel, Alexander Markowetz

June 2006 Proceedings of the 2006 ACM SIGMOD international conference on 
Management of data SIGMOD '06 

Publisher: ACM Press 

Full text available: pdf (296.76 KB) Additional Information: full citation, abstract, references, index terms

Geographic web search engines allow users to constrain and order search results in an 
intuitive manner by focusing a query on a particular geographic region. Geographic search 
technology, also called local search, has recently received significant interest from major 
search engine companies. Academic research in this area has focused primarily on 
techniques for extracting geographic knowledge from the web. In this paper, we study the 
problem of efficient query processing in scalable geogr ... 

Coverage, relevance, and ranking: The impact of query operators on Web search
engine results

Caroline M. Eastman, Bernard J. Jansen 

October 2003 ACM Transactions on Information Systems (TOIS), volume 21 issue 4 
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms, review



Full text available: pdf (37M0KB)



Research has reported that about 10% of Web searchers utilize advanced query
operators, with the other 90% using extremely simple queries. It is often assumed
that the use of query operators, such as Boolean operators and phrase searching, 
improves the effectiveness of Web searching. We test this assumption by examining the 
effects of query operators on the performance of three major Web search engines. We 
selected one hundred queries from the transaction log of a Web search servic ... 

Keywords: Boolean operators, Relative precision, Web results, coverage, query 
operators, ranking, search engines 



Transparent Queries: investigation users' mental models of search engines
Jack Muramatsu, Wanda Pratt 

September 2001 Proceedings of the 24th annual international ACM SIGIR conference 
on Research and development in information retrieval SIGIR '01 

Publisher: ACM Press 
Typically, commercial Web search engines provide very little feedback to the user
concerning how a particular query is processed and interpreted. Specifically, they apply
key query transformations without the user's knowledge. Although these transformations
have a pronounced effect on query results, users have very few resources for recognizing
their existence and understanding their practical importance. We conducted a user study
to gain a better understanding of users' knowledge of and reac ...

Learning search engine specific query transformations for question answering
Eugene Agichtein, Steve Lawrence, Luis Gravano 

April 2001 Proceedings of the 10th international conference on World Wide Web 
WWW '01 

Publisher: ACM Press 

Full text available: pdf(205.68 KB) Additional Information: full citation, references, citings, index terms



Keywords: Information retrieval, query expansion, question answering, web search 

5 Research papers: streams and pipelined processing: QPipe: a simultaneously pipelined relational query engine
Stavros Harizopoulos, Vladislav Shkapenyuk, Anastassia Ailamaki

June 2005 Proceedings of the 2005 ACM SIGMOD international conference on
Management of data SIGMOD '05
Publisher: ACM Press 

Full text available: pdf(506.36 KB) Additional Information: full citation, abstract, references, citings

Relational DBMS typically execute concurrent queries independently by invoking a set of 
operator instances for each query. To exploit common data retrievals and computation in
concurrent queries, researchers have proposed a wealth of techniques, ranging from 
buffering disk pages to constructing materialized views and optimizing multiple queries. 
The ideas proposed, however, are inherently limited by the query-centric philosophy of 
modern engine designs. Ideally, the query engine should proactive ... 

6 Indexing and querying: Three-level caching for efficient query processing in large Web search engines
Xiaohui Long, Torsten Suel

May 2005 Proceedings of the 14th international conference on World Wide Web
WWW '05
Publisher: ACM Press 

Full text available: pdf(243.61 KB) Additional Information: full citation, abstract, references, citings, index terms

Large web search engines have to answer thousands of queries per second with 
interactive response times. Due to the sizes of the data sets involved, often in the range 
of multiple terabytes, a single query may require the processing of hundreds of
megabytes or more of index data. To keep up with this immense workload, large search
engines employ clusters of hundreds or thousands of machines, and a number of 
techniques such as caching, index compression, and index and query pruning are used to 
im ... 

Keywords: Web search, caching, inverted index
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Bases, volume 11 issue 4
Publisher: Springer-Verlag New York, Inc. 
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XML has become the lingua franca for data exchange and integration across 
administrative and enterprise boundaries. Nearly all data providers are adding XML import 
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or export capabilities, and standard XML Schemas and DTDs are being promoted for all 
types of data sharing. The ubiquity of XML has removed one of the major obstacles to 
integrating data from widely disparate sources - namely, the heterogeneity of data 
formats. However, general-purpose integration of data across the wide area also re ...

Keywords: Data Integration, Data streams, Query processing, Web and databases, XML 



8 Information Retrieval: Predictive caching and prefetching of query results in search engines
Ronny Lempel, Shlomo Moran

May 2003 Proceedings of the 12th international conference on World Wide Web 
WWW '03 

Publisher: ACM Press
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terms

We study the caching of query result pages in Web search engines. Popular search 
engines receive millions of queries per day, and efficient policies for caching query results 
may enable them to lower their response time and reduce their hardware requirements. 
We present PDC (probability driven cache), a novel scheme tailored for caching search 
results, that is based on a probabilistic model of search engine users. We then use a trace 
of over seven million queries submitted to the search engine A ... 

Keywords: caching, query processing and optimization 



9 Interaction: Investigating the querying and browsing behavior of advanced search engine users
Ryen W. White, Dan Morris

July 2007 Proceedings of the 30th annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '07
Publisher: ACM Press 

Full text available: pdf(316.63 KB) Additional Information: full citation, abstract, references, index terms

One way to help all users of commercial Web search engines be more successful in their 
searches is to better understand what those users with greater search expertise are 
doing, and use this knowledge to benefit everyone. In this paper we study the interaction 
logs of advanced search engine users (and those not so advanced) to better understand 
how these user groups search. The results show that there are marked differences in the 
queries, result clicks, post-query browsing, and search succes ... 

Keywords: advanced search features, expert searching, query formulation, query syntax
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interfaces IUI '01
Publisher: ACM Press 
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With the exponential growth of information on the Internet, current information 
integration systems have become more and more unsuitable for this "Internet age" due to 
the great diversity among sources. This paper presents a constraint-based query user 
interface model, which can be applied to the construction of dynamically generated 
adaptive user interfaces for meta-search engines.
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November 2000 Proceedings of the fifth international workshop on on Information
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Full text available: pdf(736.31 KB) Additional Information: full citation, abstract, references, citings

With the worldwide growth of the Internet, research on Cross-Language Information 
Retrieval (CLIR) is being paid much attention. Existing CLIR approaches based on query
translation require parallel corpora or comparable corpora for the disambiguation of 
translated query terms. However, those natural language resources are not readily 
available. In this paper, we propose a disambiguation method for dictionary-based query 
translation that is independent of the availability of such scarce langua ... 

Keywords: WWW, cross-language information retrieval, mutual information, search
engine 



12 Indexing and querying: Sampling search-engine results
Aris Anagnostopoulos, Andrei Z. Broder, David Carmel

May 2005 Proceedings of the 14th international conference on World Wide Web
WWW '05 

Publisher: ACM Press 
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We consider the problem of efficiently sampling Web search engine query results. In turn, 
using a small random sample instead of the full set of results leads to efficient
approximate algorithms for several applications, such as: 

• Determining the set of categories in a given taxonomy spanned by the search 
results; 

• Finding the range of metadata values associated to the result set in order to enable 
"multi-faceted search;" 

• Estimating the size of the result set; 

• Data ... 

Keywords: WAND, sampling, search engines, weighted AND
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June 2007 Proceedings of the 2007 ACM SIGMOD international conference on
Management of data SIGMOD '07
Publisher: ACM Press 

Full text available: pdf(868.78 KB) Additional Information: full citation, abstract, references, index terms

A lot of research has been conducted by the database community on methods and 
techniques for efficient XPath processing, with great success. Despite the progress made, 
significant opportunities for optimization of XPath still exist. One key to further 
improvements is to utilize more effectively existing facilities of relational RDBSes for the 
processing of XPath queries. After taking a comprehensive look at such facilities, we 
present techniques for XPath processing that work by identifying t ... 

Keywords: XML, XML reconstruction, XPath, dewey encoding, indices, relational 
databases, schema mapping, structural joins
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January 2006 ACM Transactions on Information Systems (TOIS), volume 24 issue 1
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Full text available: pdf(m69KB) Additional Information: full citation, abstract, references, citings, index terms

This article discusses efficiency and effectiveness issues in caching the results of queries 
submitted to a Web search engine (WSE). We propose SDC (Static Dynamic Cache), a
new caching strategy aimed to efficiently exploit the temporal and spatial locality present 
in the stream of processed queries. SDC extracts from historical usage data the results of 
the most frequently submitted queries and stores them in a static, read-only portion of 
the cache. The remaining entries of the c ... 

Keywords: Caching, Web search engines, multithreading 



15 Search: Determining the user intent of web search engine queries
Bernard J. Jansen, Danielle L. Booth, Amanda Spink

May 2007 Proceedings of the 16th international conference on World Wide Web
WWW '07 

Publisher: ACM Press 

Full text available: pdf(195.39 KB) Additional Information: full citation, abstract, references, index terms

Determining the user intent of Web searches is a difficult problem due to the sparse data 
available concerning the searcher. In this paper, we examine a method to determine the 
user intent underlying Web search engine queries. We qualitatively analyze samples of
queries from seven transaction logs from three different Web search engines containing 
more than five million queries. From this analysis, we identified characteristics of user 
queries based on three broad classifications of user inte ... 

Keywords: search engines, user intent, web queries, web searching
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Christopher Jermaine, Subramanian Arumugam, Abhijit Pol, Alin Dobra

June 2007 Proceedings of the 2007 ACM SIGMOD international conference on
Management of data SIGMOD '07
Publisher: ACM Press 

Full text available: pdf(369.20 KB) Additional Information: full citation, abstract, references, index terms

This paper describes query processing in the DBO database system. Like other database
systems designed for ad-hoc, analytic processing, DBO is able to compute the exact 
answer to queries over a large relational database in a scalable fashion. Unlike any other 
system designed for analytic processing, DBO can constantly maintain a guess as to the 
final answer to an aggregate query throughout execution, along with statistically 
meaningful bounds for the guess's accuracy. As DBO gathers more and ... 

Keywords: DBO, online aggregation, randomized algorithms, sampling 
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Previous research has shown that researchers can generate medical hypotheses by using 
computers to analyze several, seemingly unrelated, medical literatures. In this work we 
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suggest broader application for the ideas of literature-based discovery. Specifically, we 
suggest that literature-based discovery can be fruitful in areas other than medicine; that 
in addition to finding "cures" for "problems," literature-based discovery offers the 
possibility of finding new problems for existing technologie ... 

Keywords: Literature-based discovery
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accessibility: Mining search engine query logs for query recommendation
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WWW '06 
Publisher: ACM Press 
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This paper presents a simple and intuitive method for mining search engine query logs to 
get fast query recommendations on a large scale industrial strength search engine. In 
order to get a more comprehensive solution, we combine two methods together. On the 
one hand, we study and model search engine users' sequential search behavior, and 
interpret this consecutive search behavior as client-side query refinement, that should 
form the basis for the search engine's own query refinement proc ... 



Keywords: mining, query logs, recommendation, session
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We investigate the idea of finding semantically related search engine queries based on
their temporal correlation; in other words, we infer that two queries are related if their 
popularities behave similarly over time. To this end, we first define a new measure of the 
temporal correlation of two queries based on the correlation coefficient of their frequency 
functions. We then conduct extensive experiments using our measure on two massive 
query streams from the MSN search engine, revealin ... 

Keywords: query stream analysis, search engines, semantic similarity among queries 
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